ABSTRACT

Summary: Biospectrogram is an open-source software tool for the
spectral analysis of DNA and protein sequences. The software
can fetch (from the NCBI server), import and manage biological data.
The data can be analyzed using Digital Signal Processing (DSP)
techniques: the software lets the user convert the symbolic
data into numerical data using 23 popular encodings, apply
popular transformations such as the Fast Fourier Transform (FFT),
and export the results. Exporting both encoding files and
transform files as MATLAB® .m files gives the user the option to
apply a variety of further DSP techniques. The user can also perform
window analysis (sliding in the forward or backward direction, or
stagnant) with windows of different sizes, and search for meaningful
spectral patterns dynamically by choosing a time delay for the plot
produced from the exported MATLAB® file. Random encodings and
user-choice encodings allow the software to explore many possibilities
in spectral space.

Availability: Biospectrogram is written in Java® and is freely available
for download from http://www.guptalab.org/biospectrogram.
The software has been optimized to run on Windows, Mac OS X and
Linux. A user manual and a YouTube tutorial (product demo) are also
available on the website. We are in the process of acquiring an open
source license for it.
Contact: mankg@computer.org



1 INTRODUCTION 

Molecular biology has made tremendous progress in the last
decade because various genome projects have produced vast
amounts of biological data. This has led to the ENCODE project
(http://encodeproject.org), which classifies all the basic DNA elements
of the human genome and gives new insight into numerous
molecular mechanisms. To understand digital biological
data, researchers use techniques from mathematics, computer
science and other fields. Digital signal processing (DSP) is a fundamental
concept in information and communication technology (ICT). A
natural question arises: "Can DSP techniques help us to understand
the digital biology?" It turns out that DSP techniques play
a major role in biology and have given birth to a new branch called
genomic signal processing (Shmulevich and Dougherty, 2007). To
analyse genomic data, researchers first convert the symbolic data
(for example, DNA or protein data) into numerical data by applying a
suitable map (Kwan and Arniker, 2009; Arniker and Kwan, 2012)
and then apply signal processing transforms such as the Fourier
transform to study the desired biological properties (Lorenzo-Ginori
et al., 2009). In this work, we present a tool, Biospectrogram, which
helps researchers apply different encodings to biological
data and then apply transformations for spectral analysis.
Users can also export the files (encoded or transformed) to the popular
MATLAB® software (MATLAB, 2010) for direct analysis.
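
The two-step procedure described above (map symbols to numbers, then apply a Fourier transform) can be illustrated with a minimal Python sketch. It is not taken from Biospectrogram: the names `EXAMPLE_MAP`, `encode`, `dft` and `power_spectrum` are hypothetical, the numeric values in the map are arbitrary illustrative choices, and a naive discrete Fourier transform stands in for an FFT (both yield the same coefficients).

```python
import cmath

# Hypothetical real-valued map, chosen only for illustration; these are not
# the values of any particular encoding shipped with Biospectrogram.
EXAMPLE_MAP = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(sequence, symbol_map):
    """Convert a symbolic sequence into a list of numbers."""
    return [symbol_map[s] for s in sequence.upper()]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (an FFT gives the same values)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def power_spectrum(signal):
    """Squared magnitude of each DFT coefficient."""
    return [abs(c) ** 2 for c in dft(signal)]


numeric = encode("ACGTACGT", EXAMPLE_MAP)   # symbolic -> numerical
spectrum = power_spectrum(numeric)          # numerical -> spectral
```

Because the example sequence repeats with period 4 over 8 samples, only the even-numbered bins of `spectrum` are non-zero under this map; a different map would change the values but not the two-step structure.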



2 IMPLEMENTATION AND FEATURES 

The Biospectrogram tool has four major components, viz. data
collector, encode, transforms and export & plot. The tool can be
used in DNA or Protein mode by means of the switch button. The
tool has two main windows, viz. the display window, which shows the
data (collected or encoded), and the work window, which shows the
results (encoded or transformed data). The data collector module fetches
DNA data (in both FASTA and GenBank file formats) directly
from the National Center for Biotechnology Information (NCBI) server
given an accession number from the user; the fetched data can then be
encoded using the encode button. Users can also import files from their
own machine or network, and can select a portion of the data in the
window for further processing. One popular encoding map is
the Voss representation (Voss, 1992), which maps the nucleotides
A, C, G and T from DNA space into the four binary indicator
sequences x_A[n], x_C[n], x_G[n] and x_T[n], which mark the presence
(e.g. 1) or absence (e.g. 0) of the respective nucleotide. Similar
indicator maps are available for protein space. The available
encodings (23 in our tool) and transformations (6 in our tool)
are shown in Figure 1. To apply an encoding, the
user selects the fetched file from the first dropdown list
and the encoding scheme from the second dropdown list. The FASTA
file of the DNA sequence is shown in the display window and
the encoded output is shown in the work window. After encoding
the fetched DNA or protein sequence, one can apply the
suitable transforms (see Figure 1) available in the tool. To apply



© Oxford University Press 2012. 










Encodings

-Indicator Encoding (E01.1.1/2/3/4 for A/C/G/T) (Voss, 1992)
-DV Curve Encoding (E01.4) (Zhang, 2009)
-Real Value Encoding (E05.1) (Cristea, 2003; Rosen, 2006; Chakravarthy et al., 2004)
-Real Value Encoding (User choice) (E05.2)
-Electro Ion Encoding (E05.3) (Ning et al., 2003)
-Random Real Value Encoding (E05.4)
-Quaternary Integer Mapping 1 (E06.1) & 2 (E06.2) (Dan Cristea, 2003)
-Protein Indicator Encoding (E07.1.1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20 for
 A/C/D/E/F/G/H/I/K/L/M/N/P/Q/R/S/T/V/W/Y) & Protein Electro Ion Encoding (E07.2)
 (Vaidyanathan, 2005)
-Protein Real Value Encoding (User choice) (E07.3)
-Protein Random Real Value Encoding (E07.4)

-Complex Encoding 1 (E02.1) (Anastassiou, 2001)
-Complex Encoding (User choice) (E02.2)
-Complex Encoding 2 (E02.3) (Rao and Shepherd, 2004)
-Random Complex Encoding (E02.4)

-Tetrahedron Encoding (E01.2) (Silverman and Linsker, 1986)
-Z Curve Encoding (E01.3) (Zhang & Zhang, 1994)

-Graphical Encoding 1 (E03.1) (Liao, 2005)
-Graphical Encoding 2 (E03.2) (Yau et al., 2003)
-Random Graphical Encoding (E03.3)

-Quaternion Encoding 1 (E04.1) & 2 (E04.2) (Brodzik & Peters, 2005; Akhtar et al., 2007)

Transformations

-Fast Fourier Transform (T01)
-Z Transform (T03)
-Chirp Z Transform (T06)

-Hilbert Transform (T02)
-Analytic Signal (T04)
-Discrete Haar Wavelet Transform (T05)

-Discrete Cosine Transform
-Discrete Sine Transform
-Discrete Hartley Transform

-Walsh-Hadamard Transform

-3D Fast Fourier Transform
-3D Analytic Signal

-2D Fast Fourier Transform
-2D Analytic Signal
-2D Discrete Haar Wavelet Transform

-Quaternion Fast Fourier Transform

Export and Plot



Fig. 1. Basic architecture of Biospectrogram showing its four major components (data collector, encode, transforms and export & plot). The arrows relate the
23 encodings to the transformations: solid arrows mark combinations available in our tool, and broken arrows mark combinations possible using the third-party software MATLAB.
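
As an illustration of the indicator encoding listed first in Figure 1, the following Python sketch builds the binary indicator sequences of the Voss representation. The function name `voss_indicator` and its `alphabet` parameter are hypothetical and do not reflect the tool's internal Java code; the sketch only shows the mapping itself.

```python
def voss_indicator(sequence, alphabet="ACGT"):
    """Return one binary indicator sequence per symbol of the alphabet.

    Each list holds 1 where the symbol is present and 0 where it is absent.
    """
    seq = sequence.upper()
    return {sym: [1 if s == sym else 0 for s in seq] for sym in alphabet}


indicators = voss_indicator("ATGCA")
for symbol, values in indicators.items():
    print(f"x_{symbol}[n] =", values)
# x_A[n] = [1, 0, 0, 0, 1]
# x_C[n] = [0, 0, 0, 1, 0]
# x_G[n] = [0, 0, 1, 0, 0]
# x_T[n] = [0, 1, 0, 0, 0]
```

Passing the 20-letter amino-acid alphabet from Figure 1 (`"ACDEFGHIKLMNPQRSTVWY"`) as `alphabet` gives the analogous indicator map for protein sequences.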



other transforms (not available in our tool), filters, etc., one
can export the encoded files to MATLAB® .m files and carry out
further analysis there. For an exhaustive search for a pattern, a window
analysis can be done with our tool. The window button allows
the user to set a window size. With the sliding window option, the
window moves in either direction (forward or backward),
whereas with the stagnant window option the user selects a fixed portion
of the sequence, for which the power spectrum is obtained from all its
indicator sequences. By choosing an appropriate delay time in the preferences,
one can plot the transformation output of our tool by exporting the
transformation files to MATLAB® .m files and observe the signal
change smoothly and automatically, with the delay set by the user.
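
The window analysis described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the tool's implementation: the names `sliding_windows`, `stagnant_window` and `indicator_power_spectrum`, the `step` parameter, and the choice to sum the squared DFT magnitudes of the four indicator sequences are all assumptions made for the example, and a naive DFT stands in for an FFT.

```python
import cmath


def indicator_power_spectrum(segment, alphabet="ACGT"):
    """Sum, over all symbols, of |DFT|^2 of each binary indicator sequence."""
    n = len(segment)
    total = [0.0] * n
    for sym in alphabet:
        x = [1.0 if s == sym else 0.0 for s in segment]
        for k in range(n):
            coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t, v in enumerate(x))
            total[k] += abs(coeff) ** 2
    return total


def sliding_windows(sequence, size, step=1, forward=True):
    """Yield (start, segment) pairs, scanning forward or backward."""
    last = len(sequence) - size
    # Forward scans start at the left end, backward scans at the right end.
    starts = range(0, last + 1, step) if forward else range(last, -1, -step)
    for start in starts:
        yield start, sequence[start:start + size]


def stagnant_window(sequence, start, size):
    """Return one fixed segment chosen by the user."""
    return sequence[start:start + size]


seq = "ATGATGATGCCC"
forward_starts = [s for s, _ in sliding_windows(seq, 6, step=3)]
backward_starts = [s for s, _ in sliding_windows(seq, 6, step=3, forward=False)]
per_window = {s: indicator_power_spectrum(w) for s, w in sliding_windows(seq, 6, step=3)}
fixed = indicator_power_spectrum(stagnant_window(seq, 0, 9))
```

For the nine-symbol stagnant window `ATGATGATG`, the summed power is concentrated in bins 0, 3 and 6, which reflects the period-3 repetition of the segment; the sliding variants compute the same quantity for each window position in turn.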